The space of probability distributions on a given sample space possesses natural geometric properties. For example, in the case of a smooth parametric family of probability distributions on the real line, the parameter space has a Riemannian structure induced by the embedding of the family into the Hilbert space of square-integrable functions, and is characterised by the Fisher-Rao metric. In the nonparametric case the relevant geometry is determined by the spherical distance function of Bhattacharyya. In the context of term structure modelling, we show that minus the derivative of the discount function with respect to the maturity date gives rise to a probability density. This follows as a consequence of the positivity of interest rates. Therefore, by mapping the density functions associated with a given family of term structures to Hilbert space, the resulting metrical geometry can be used to analyse the relationship of yield curves to one another. We show that the general arbitrage-free yield curve dynamics can be represented as a process taking values in the convex space of smooth density functions on the positive real line. It follows that the theory of interest rate dynamics can be represented by a class of processes in Hilbert space. We also derive the dynamics for the central moments associated with the distribution determined by the yield curve. (26 June 2000)
1. Introduction

The theory of interest rates has gone through two major developments in recent decades. Following initial investigations by Merton (1973) and others, the first decisive advance culminated in the work of Vasicek (1977), who was able to give a fairly general characterisation of the arbitrage-free dynamics of a family of discount bonds indexed by their maturity. The well-known model that bears his name appears as an exact solution obtained with specialising assumptions. In the wake of Vasicek's work came a number of other specific interest rate models, of varying degrees of usefulness and tractability, including, for example, the CIR model (Cox et al. 1985) and its generalisations. The next significant line of development, following the general martingale characterisation of arbitrage-free asset pricing by Harrison & Kreps (1979) and Harrison & Pliska (1981), was instigated with the recognition by Ho & Lee (1986) that the initial term structure might be specified essentially arbitrarily, a feature that has important practical implications. This insight was incorporated into the HJM framework (Heath et al. 1992), which constituted a major advance in the subject, providing a general model-independent basis for the analysis of interest rate dynamics and the pricing of interest rate derivatives.

Since then there have been numerous further developments. These include, for example, the infinite-dimensional or 'string-type' models of Kennedy (1994), Santa-Clara & Sornette (1997) and others, the positive interest rate models of Flesaker & Hughston (1996), the potential approach of Rogers (1997), the so-called market models (Brace et al. 1996, 1997; Jamshidian 1997), and the geometric analysis of the space of yield curves undertaken by Bjork & Svensson (1999).

Nevertheless, no criterion has emerged, based on the extensive econometric 
evidence available, that allows in a rational way for the identification of a clearly 
preferred class of models. On these grounds it makes sense to try to cast the 
general interest rate framework into a new form, with the idea that certain models 
might thus become recognisable as more natural on mathematical and economic 
grounds. 

With this end in mind, the purpose of the present article is to propose a novel 
application of information geometry to interest rate theory. The main results are 
(i) the construction of a geometric measure for how 'different' two term structures 
are from one another; (ii) a characterisation of the evolutionary trajectory of the 
term structure as a measure-valued process; (iii) the derivation of dynamics for 
the principal moments of the term structure; and (iv) a reformulation of arbitrage- 
free interest rate dynamics in terms of a class of processes on Hilbert space. 

The paper is organised as follows. In §2 we review the basic idea of information geometry and its role in estimation theory. The geometry of the normal distribution is considered in detail as an illustration. In §3 a remarkable characterisation of the discount function in terms of an abstract probability density function is introduced in Proposition 1. This allows us to apply information geometric techniques to determine the deviation between different term structures within a given model. In this connection, in §4 we consider a class of flat rate models as examples.

The material of the first four sections of the paper is essentially static, i.e., set in 
the present, whereas in §5 we investigate the dynamics of the density function that 
generates the term structure. This is carried out in such a way that the resulting 
dynamics is manifestly arbitrage-free. Our key result here is formula (5.15), in 
which we establish that the dynamics of the term structure can be characterised 
as a measure-valued process. This idea is developed further in Proposition 2. 

In §6 we introduce an analogue of the classical principal components analysis for yield curves, and in Propositions 3 and 4 we derive formulae for the evolution of the first two moments of the term structure density process. Then, making use of the information geometry developed earlier, in §7 we map the dynamics developed in §5 to Hilbert space. Our main result here is Proposition 5, which shows how this can be achieved.



Proc. R. Soc. Lond. A (2001)



Interest Rates and Information Geometry 3 
2. Information geometry 

Because some of the mathematical techniques we employ here may not be familiar to those working in finance, it will be appropriate to begin with a few background remarks. It has long been known (see, e.g., Amari 1985; Kass 1989; Murray & Rice 1993) that a useful approach to statistical inference is to regard a parametric model as a differentiable manifold equipped with a metric. The recognition that a parametric family of probability distributions has a natural geometry associated with it arose in the work of Mahalanobis (1936), Bhattacharyya (1943) and Rao (1945) over half a century ago.

Suppose, for example, that $X$ is a continuous random variable taking values on the real line $\mathbb{R}$, and that $p(x)$ is a density function for $X$. Because $p(x)$ is nonnegative and has integral unity, it follows that the square-root likelihood function

$$\xi(x) = \sqrt{p(x)} \qquad (2.1)$$

exists for all $x$, and satisfies the normalisation condition

$$\int_{-\infty}^{\infty}\xi^2(x)\,dx = 1. \qquad (2.2)$$

We see that $\xi(x)$ can be regarded as a unit vector in the Hilbert space $\mathcal{H} = L^2(\mathbb{R})$. Now let $p_1(x), p_2(x)$ denote a pair of density functions on $\mathbb{R}$, and $\xi_1(x), \xi_2(x)$ the corresponding Hilbert space elements. Then the inner product

$$\cos\phi = \int_{-\infty}^{\infty}\xi_1(x)\,\xi_2(x)\,dx \qquad (2.3)$$

defines an angle $\phi$ which can be interpreted as the distance between the two probability distributions. More precisely, if we write $\mathcal{S}$ for the unit sphere in $\mathcal{H}$, then $\phi$ is the spherical distance between the points on $\mathcal{S}$ determined by the vectors $\xi_1(x)$ and $\xi_2(x)$. The maximum possible distance, corresponding to nonoverlapping densities, is given by $\phi = \pi/2$. This follows from the fact that $\xi_1(x)$ and $\xi_2(x)$ are nonnegative functions, and thus define points on the positive orthant of $\mathcal{S}$. We remark that an alternative way of expressing (2.3) is

$$\cos\phi = 1 - \frac{1}{2}\int_{-\infty}^{\infty}\left(\xi_1(x) - \xi_2(x)\right)^2dx, \qquad (2.4)$$

which makes it apparent that the angle $\phi$ measures the extent to which the two distributions are distinct.

The spherical distance of Bhattacharyya introduced above is applicable in a nonparametric context. In the case of a parametric family of probability distributions we can develop matters further. Let us write $p(x,\theta)$ for the parameterised density function. Here $\theta$ stands for a set of parameters $\theta^i$ ($i = 1, \ldots, r$). By varying $\theta$ we obtain an $r$-dimensional submanifold $\mathcal{M}$ in $\mathcal{S}$ determined by the unit vectors $\xi(x,\theta) \in \mathcal{H}$. The parameters $\theta^i$ are local coordinates for $\mathcal{M}$.

The key point that we require in the following (cf. Dawid 1977) is that the spherical geometry of $\mathcal{S}$ induces a Riemannian geometry on $\mathcal{M}$, for which the metric tensor $g_{ij}(\theta)$ is given, in local coordinates, by

$$g_{ij}(\theta) = \int_{-\infty}^{\infty}\frac{\partial\xi(x,\theta)}{\partial\theta^i}\,\frac{\partial\xi(x,\theta)}{\partial\theta^j}\,dx. \qquad (2.5)$$

By use of definition (2.1), we see that an alternative expression for $g_{ij}(\theta)$ is

$$g_{ij}(\theta) = \frac{1}{4}\int_{-\infty}^{\infty}p(x,\theta)\,\frac{\partial\ln p(x,\theta)}{\partial\theta^i}\,\frac{\partial\ln p(x,\theta)}{\partial\theta^j}\,dx, \qquad (2.6)$$

which shows (cf. Brody & Hughston 1998) that the metric $g_{ij}$ is, apart from the factor of $\frac{1}{4}$, the Fisher information matrix, i.e., the covariance matrix of the parametric gradient of the log-likelihood function (Fisher 1921). We refer to $g_{ij}(\theta)$ as the Fisher-Rao metric on the statistical model $\mathcal{M}$.
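One way to see (2.5) and (2.6) at work is to evaluate the metric numerically for the normal family, differentiating $\xi$ by central finite differences and integrating with the midpoint rule. This is a sketch under assumed step sizes and integration limits, not a general-purpose routine. For $\sigma = 0.5$ the entries should agree with the closed-form line element obtained below for the normal distributions, namely $g_{\mu\mu} = 1/(4\sigma^2) = 1$, $g_{\sigma\sigma} = 2/(4\sigma^2) = 2$ and $g_{\mu\sigma} = 0$.

```python
import math


def xi(x, mu, sigma):
    """Square-root likelihood (2.1) of the normal density N(mu, sigma)."""
    z = (x - mu) / sigma
    return math.sqrt(math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma))


def fisher_rao_metric(mu, sigma, eps=1e-5, lo=-40.0, hi=40.0, n=20000):
    """Metric (2.5) for theta = (mu, sigma): central differences in theta, midpoint rule in x."""
    h = (hi - lo) / n
    g_mm = g_ms = g_ss = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        d_mu = (xi(x, mu + eps, sigma) - xi(x, mu - eps, sigma)) / (2.0 * eps)
        d_sigma = (xi(x, mu, sigma + eps) - xi(x, mu, sigma - eps)) / (2.0 * eps)
        g_mm += d_mu * d_mu * h
        g_ms += d_mu * d_sigma * h
        g_ss += d_sigma * d_sigma * h
    return [[g_mm, g_ms], [g_ms, g_ss]]


g = fisher_rao_metric(0.3, 0.5)
print(g)  # compare with diag(1 / (4 sigma^2), 2 / (4 sigma^2)) = diag(1.0, 2.0)
```

Multiplying these entries by four gives the familiar Fisher information $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$, which is the factor of $\frac{1}{4}$ mentioned above.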

The significance of the Fisher-Rao metric in estimation theory is well known. Suppose that $\tau(\theta)$ is some given function of the parameters, and that the random variable $T$ represented by the function $T(x)$ on $\mathbb{R}$ is an unbiased estimator for $\tau(\theta)$ in the sense that

$$\int_{-\infty}^{\infty}p(x,\theta)\,T(x)\,dx = \tau(\theta). \qquad (2.7)$$

The variance of the estimator $T$ is defined, as usual, by

$$\mathrm{Var}[T] = \int_{-\infty}^{\infty}p(x,\theta)\left(T(x) - \tau(\theta)\right)^2dx. \qquad (2.8)$$

Then a set of fundamental bounds on $\mathrm{Var}[T]$, independent of the choice of the estimator $T(x)$, can be obtained by applying the operator $a^i\partial_i$ to (2.7), letting $a^i$ be arbitrary. By use of (2.1) and the Schwarz inequality for $L^2(\mathbb{R})$, we obtain

$$\mathrm{Var}[T]\,g_{ij} \geq \frac{1}{4}\,\partial_i\tau\,\partial_j\tau. \qquad (2.9)$$

This matrix inequality is interpreted as saying that if we subtract the right side from the left, the result is nonnegative definite. It follows that if the random variables $\Theta^i$ ($i = 1, \ldots, r$) are unbiased estimators for the parameters $\theta^i$, satisfying

$$\int_{-\infty}^{\infty}p(x,\theta)\,\Theta^i(x)\,dx = \theta^i, \qquad (2.10)$$

then the covariance matrix of the estimators is bounded by the inverse Fisher information matrix:

$$\mathrm{Cov}[\Theta^i, \Theta^j] \geq \frac{1}{4}\,g^{ij}. \qquad (2.11)$$

The Riemannian metric (2.5) introduced above can be used to define a distance measure between two distributions belonging to a given parametric family. This measure is invariant in the sense that it is unaffected by a reparameterisation of the distributions. The distance is calculated by integrating the infinitesimal line element $ds$ along the geodesic connecting the two points in the statistical manifold $\mathcal{M}$, where

$$ds^2 = g_{ij}\,d\theta^i\,d\theta^j. \qquad (2.12)$$

The geodesics with respect to a given metric $g_{ij}$ are the solutions of the differential equation

$$\frac{d^2\theta^i}{ds^2} + \Gamma^i_{jk}\,\frac{d\theta^j}{ds}\,\frac{d\theta^k}{ds} = 0 \qquad (2.13)$$

for the curve $\theta^i(s)$ in $\mathcal{M}$ subject to the given boundary conditions at the two end points. Here, we have written

$$\Gamma^i_{jk} = \frac{1}{2}\,g^{il}\left(\partial_j g_{lk} + \partial_k g_{jl} - \partial_l g_{jk}\right), \qquad (2.14)$$

where $\partial_i = \partial/\partial\theta^i$, and the inverse metric $g^{ij}$, also appearing in (2.11), satisfies $g^{ij}g_{jk} = \delta^i_k$, where $\delta^i_k$ is the Kronecker delta. Note that in equations (2.13) and (2.14) above, and elsewhere henceforth in this article, we employ the standard Einstein summation convention on repeated indices.

Figure 1. Geodesic curves for normal distributions. The statistical manifold $\mathcal{M}$ in this case is the upper half-plane parameterised by $\mu$ and $\sigma$. We have $-\infty < \mu < \infty$ and $0 < \sigma < \infty$. The shortest path joining the two normal distributions $N(\mu_1, \sigma_1)$ and $N(\mu_2, \sigma_2)$ is given by the unique semi-circular arc through the given two points and centred on the boundary line $\sigma = 0$.

Let us consider, as an explicit example, the manifold $\mathcal{M}$ corresponding to the normal distributions $N(\mu, \sigma)$ on $\mathbb{R}$, with mean $\mu$ and standard deviation $\sigma$. For the parameterised density function we have

$$p(x,\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \qquad (2.15)$$

A straightforward computation, making use of (2.6), gives

$$ds^2 = \frac{1}{4\sigma^2}\left(d\mu^2 + 2\,d\sigma^2\right) \qquad (2.16)$$

for the line element, which is defined on the upper half-plane $-\infty < \mu < \infty$, $0 < \sigma < \infty$. The resulting Riemannian geometry is that of hyperbolic space, which is a homogeneous manifold with constant negative curvature. The geometry of this space has been studied extensively, and has many intriguing properties. For the distance function in the case of a pair of normal distributions $N(\mu_1, \sigma_1)$, $N(\mu_2, \sigma_2)$ we obtain

$$D(p_1,p_2) = \frac{1}{\sqrt{2}}\log\frac{1+\delta_{1,2}}{1-\delta_{1,2}}, \qquad (2.17)$$

where the function $\delta_{1,2}$, defined by

$$\delta_{1,2} = \sqrt{\frac{(\mu_2-\mu_1)^2 + 2(\sigma_2-\sigma_1)^2}{(\mu_2-\mu_1)^2 + 2(\sigma_2+\sigma_1)^2}}, \qquad (2.18)$$

lies between 0 and 1. The geodesics, in particular, are given in general by semi-circular arcs centred on the boundary line $\sigma = 0$ (this line itself is not part of the manifold $\mathcal{M}$). An exceptional situation arises when $\mu_1 = \mu_2$, for which the geodesic is a straight line given by constant $\mu$, and we have

$$D(p_1,p_2) = \frac{1}{\sqrt{2}}\left|\log\frac{\sigma_1}{\sigma_2}\right|. \qquad (2.19)$$

We refer the reader to Burbea (1986), where metric and distance computations have been carried out explicitly for other families of distributions.

Figure 2. The system of admissible term structures. A smooth positive interest term structure can be regarded as a point in $\mathcal{D}(\mathbb{R}_+)$, the convex space consisting of smooth density functions on $\mathbb{R}_+$. The points of $\mathcal{D}(\mathbb{R}_+)$ are in one-to-one correspondence with rays lying in the positive orthant $\mathcal{S}^+$ of the unit sphere $\mathcal{S}$ in the Hilbert space $\mathcal{H} = L^2(\mathbb{R}_+)$.
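The closed-form distance for normal distributions, (2.17) with (2.18), is easy to evaluate directly. The sketch below (function names are illustrative) checks two properties stated in the text: for equal means it reduces to (2.19), and the geodesic between $N(0,1)$ and $N(2,1)$ is shorter than the straight path at constant $\sigma = 1$, whose length under the line element (2.16) is $\int|d\mu|/(2\sigma) = 1$.

```python
import math


def normal_fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Distance (2.17)-(2.18) for the line element ds^2 = (d mu^2 + 2 d sigma^2) / (4 sigma^2)."""
    dm2 = (mu2 - mu1) ** 2
    delta = math.sqrt((dm2 + 2.0 * (sigma2 - sigma1) ** 2) / (dm2 + 2.0 * (sigma2 + sigma1) ** 2))
    return math.log((1.0 + delta) / (1.0 - delta)) / math.sqrt(2.0)


def horizontal_path_length(mu1, mu2, sigma):
    """Length of the straight path at fixed sigma under (2.16): |mu2 - mu1| / (2 sigma)."""
    return abs(mu2 - mu1) / (2.0 * sigma)


# Same-mean case: (2.19) gives log(e) / sqrt(2).
print(normal_fisher_rao_distance(0.0, 1.0, 0.0, math.e), 1.0 / math.sqrt(2.0))
# The geodesic is shorter than the straight path between N(0, 1) and N(2, 1).
print(normal_fisher_rao_distance(0.0, 1.0, 2.0, 1.0), horizontal_path_length(0.0, 2.0, 1.0))
```

Because $\delta_{1,2}$ is unchanged when $(\mu, \sigma)$ is mapped to $(a\mu + b, a\sigma)$ with $a > 0$, the distance is also invariant under changes of location and scale, which the same function confirms.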



3. Discount bond densities 

Our goal now is to make use of the analysis presented in the previous section 
to construct a natural metric on the space of yield curves. In doing so we shall 
take advantage of a remarkable 'probabilistic' characterisation of discount bonds, 
which we here proceed to describe. 

Let $t = 0$ denote the present, and $P_{0T}$ a smooth family of discount bonds, where $T$ is the maturity date ($0 \le T < \infty$). For positive interest we require

$$0 < P_{0T} \le 1, \qquad \partial_T P_{0T} < 0, \qquad (3.1)$$

with $P_{00} = 1$, and we assume that $P_{0T} \to 0$ as $T$ goes to infinity. A term structure that satisfies these conditions will be said to be 'admissible'. These conditions can, in fact, be relaxed slightly: $P_{0T}$ need not be strictly smooth, nor strictly decreasing; but for most of the present discussion we shall stick with the assumptions indicated.

The interesting point that arises here, of which we shall make extensive use in the discussions that follow, is that the discount function $P_{0T}$ can be viewed as a complementary probability distribution. In other words, we think of the maturity date as an abstract random variable $X$, and for its distribution we write

$$\mathbb{P}[X < T] = 1 - P_{0T}. \qquad (3.2)$$

It should be clear that this can be done if and only if the positive interest rate 
conditions given in (3.1) hold. As a consequence we are able to embody the 
positive interest property in a fundamental way in the structure of the theory. 




Indeed, this basic economic property is essential if we wish to treat the yield curve consistently and naturally as a kind of mathematical object in its own right. Now let us introduce the function $p(T)$ defined by

$$p(T) = -\partial_T P_{0T}. \qquad (3.3)$$

Clearly, we have $p(T) > 0$ and, since $P_{00} = 1$ and $P_{0T} \to 0$ as $T \to \infty$,

$$\int_0^\infty p(T)\,dT = 1, \qquad (3.4)$$

from which we infer that $p(T)$ can be consistently viewed as a probability density function. It follows from the defining equation (3.3) that the term structure density $p(T)$ is the product of the instantaneous forward rate $-\partial_T\ln P_{0T}$ and the discount function itself. Now clearly if $p_1(T)$ and $p_2(T)$ are admissible term structure densities, and if $A$ and $B$ are nonnegative constants satisfying $A + B = 1$, then $Ap_1(T) + Bp_2(T)$ is also an admissible term structure density. Putting these ingredients together, we see that the term structure of interest rates can be given the following general characterisation.

Proposition 1. The system of admissible term structures is isomorphic to the convex space $\mathcal{D}(\mathbb{R}_+)$ of smooth density functions on the positive real line.

At first glance it may seem odd to think of the discount function in this manner. 
However, it gives us the advantage of being able to apply the tools of information 
geometry in an unexpected way, as we indicate in what follows. 

In particular, there is a one-to-one map from the space $\mathcal{D}(\mathbb{R}_+)$ of such term structure densities to the positive orthant $\mathcal{S}^+$ of the unit sphere $\mathcal{S}$ in the Hilbert space $\mathcal{H}$, as indicated in Figure 2. Therefore, given two yield curves we can calculate the distance between them. This can be carried out either in a nonparametric sense, by use of the Bhattacharyya spherical distance, or in a parametric sense, by use of the Fisher-Rao distance. In the former case we first calculate the corresponding term structure densities $p_1(T)$ and $p_2(T)$. These are then mapped to $\mathcal{S}^+$ by taking the square roots, and their distance $\phi(p_1,p_2)$ is given by

$$\phi(p_1,p_2) = \cos^{-1}\int_0^\infty\sqrt{p_1(T)\,p_2(T)}\,dT. \qquad (3.5)$$

In the parametric case we regard the given parametric family of yield curves as defining a statistical model $\mathcal{M} \subset \mathcal{S}^+$, and the distance between the two yield curves within the given family is then defined by the Fisher-Rao metric.
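Proposition 1 and (3.5) suggest a direct recipe: start from two discount functions, differentiate to obtain the densities (3.3), and integrate the square root of their product. The Python sketch below does this with finite differences and a midpoint rule after mapping $[0,\infty)$ onto $[0,1)$ with $T = s\,t/(1-t)$. The scale $s = 10$ years, the step sizes and the helper names are assumptions. For two flat continuously compounded curves the result can be compared with the closed form $2\sqrt{R_1R_2}/(R_1+R_2)$.

```python
import math


def density_from_discount(P, T, h=1e-5):
    """Term structure density (3.3), p(T) = -dP/dT, by a central finite difference."""
    if T < h:  # stay inside T >= 0
        return -(P(T + h) - P(T)) / h
    return -(P(T + h) - P(T - h)) / (2.0 * h)


def integrate_half_line(f, scale=10.0, n=20000):
    """Midpoint rule for the integral of f over [0, inf), using T = scale * t / (1 - t)."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += f(scale * t / (1.0 - t)) * scale / (1.0 - t) ** 2
    return total / n


def yield_curve_angle(P1, P2):
    """Bhattacharyya angle (3.5) between two admissible discount functions P1, P2."""
    def overlap(T):
        p1 = max(density_from_discount(P1, T), 0.0)  # guard against rounding below zero
        p2 = max(density_from_discount(P2, T), 0.0)
        return math.sqrt(p1 * p2)
    return math.acos(min(1.0, integrate_half_line(overlap)))


continuous = lambda T: math.exp(-0.05 * T)  # flat 5% continuously compounded
simple = lambda T: 1.0 / (1.0 + 0.05 * T)   # flat 5% simple yield
print(yield_curve_angle(continuous, simple))
```

The example pairs two curves that are both 'flat at 5%' in different senses; the nonzero angle quantifies how far apart these two notions of flatness are.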

4. Flat term structures 

To provide some illustrations of the principles set forth in the previous section 
we consider here properties of yield curves for which the term structure is flat. 
Such yield curves, which are of various types, are on the whole too simple for use 
in practical modelling. Nevertheless, they are of interest as examples, because 
many of the relevant computations can be carried out explicitly. 

In this connection we begin by introducing a representation of the discount function as a Laplace transform

$$P_{0T} = \int_0^\infty e^{-rT}\,\psi(r)\,dr \qquad (4.1)$$

for some function $\psi(r)$. Thus we think of the discount function $P_{0T}$ as being given by a weighted superposition of elementary discount functions, each of the form $e^{-rT}$ for some value of $r$. Taking the limit $T \to 0$, we find that $\psi(r)$ must satisfy $\int_0^\infty\psi(r)\,dr = 1$. In general the inverse Laplace transform $\psi(r)$ need not be positive. However, if we restrict our consideration to nonnegative functions, then $\psi(r)$ can be interpreted as a density function, and by various choices of $\psi(r)$ we are led to some interesting candidates for term structures.

First we consider the case where $\psi(r)$ is a Dirac $\delta$-function concentrated at a point, that is, $\psi(r) = \delta(r - R)$. A direct substitution gives $P_{0T} = \exp(-RT)$, corresponding to a 'flat' term structure with a continuously compounded rate $R$ for each value of the maturity date $T$. If the density function $\psi(r)$ is given by an exponential distribution $\psi(r) = \tau\exp(-\tau r)$, with parameter $\tau$, then one sees that $\tau$ must have dimensions of time, and a short calculation gives $P_{0T} = \tau/(\tau + T)$, which also corresponds to a flat term structure, in this case with a simple percentage yield of $1/\tau$ for all maturities. We see that the characteristic time-scale $\tau$ allows us to define an interest rate $R = \tau^{-1}$, which turns out to be the characteristic interest rate of the resulting structure, and we can write $P_{0T} = 1/(1 + RT)$ for the discount function.

We note that flatness is not a completely unambiguous notion, because having a uniform continuously compounded yield for all maturities is not the same thing as having a uniform simple yield for all maturities. Both define plausible albeit quite distinct systems of discount bonds. This example illustrates how, by superposing term structures of the elementary form $\exp(-rT)$ for various rates $r$, we can obtain other reasonable-looking and well-behaved term structures. We mention one more example, which contains the previous two examples as special cases. Consider the standard gamma distribution, with parameters $n$ and $\lambda$, defined for nonnegative values of $r$ by the density function

$$\psi(r) = \frac{1}{\Gamma(n)}\,\lambda^n r^{n-1}\exp(-\lambda r). \qquad (4.2)$$

In this case, we can verify that the resulting system of discount bonds is given by

$$P_{0T} = \left(\frac{\lambda}{\lambda + T}\right)^{n}, \qquad (4.3)$$

which assumes a more recognisable form if we set $n = k$ and $\lambda = k\tau$, where $\tau$ again defines a characteristic time scale, and $k$ is a dimensionless number. Then we have

$$P_{0T} = \left(1 + \frac{RT}{k}\right)^{-k}, \qquad (4.4)$$

where $R = \tau^{-1}$. The system of discount bonds arising here can also be interpreted as a flat term structure, in this case with a constant annualised rate of interest $R$ assuming compounding at the frequency $k$ over the life of each bond ($k$ need not be an integer). It is not difficult to check that for $k = 1$ this reduces to the case of a flat rate on the basis of a simple yield, whereas in the limit $k \to \infty$ we recover the case of a flat rate on the basis of continuous compounding.
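The Laplace-transform construction (4.1) can be checked numerically: mixing elementary curves $e^{-rT}$ with the gamma weights (4.2), taking $n = k$ and $\lambda = k/R$, should reproduce the compounded flat curve $(1 + RT/k)^{-k}$. The sketch below assumes a midpoint rule in a variable $t$ with $r = 0.1\,t/(1-t)$, and the function names are illustrative. Setting $k = 1$ recovers the simple-yield curve $1/(1+RT)$, and a very large $k$ approaches $e^{-RT}$.

```python
import math


def gamma_density(r, n, lam):
    """Gamma density (4.2) with shape n and parameter lam (lam has units of time)."""
    if r <= 0.0:
        return 0.0
    return math.exp(n * math.log(lam) + (n - 1.0) * math.log(r) - lam * r - math.lgamma(n))


def discount_from_mixture(T, n, lam, scale=0.1, N=20000):
    """P_{0T} from (4.1): integral over r >= 0 of exp(-r T) psi(r), with r = scale * t / (1 - t)."""
    total = 0.0
    for i in range(N):
        t = (i + 0.5) / N
        r = scale * t / (1.0 - t)
        total += math.exp(-r * T) * gamma_density(r, n, lam) * scale / (1.0 - t) ** 2
    return total / N


def compounded_discount(T, R, k):
    """Flat rate R compounded at frequency k: (1 + R T / k) ** (-k)."""
    return (1.0 + R * T / k) ** (-k)


R, T = 0.05, 10.0
for k in (1.0, 2.0, 4.5):
    # gamma mixture with n = k, lam = k * tau, tau = 1 / R, against the closed form
    print(k, discount_from_mixture(T, k, k / R), compounded_discount(T, R, k))
```

Using `math.lgamma` rather than `math.gamma` keeps the weights finite for large shape parameters, and it also covers non-integer $k$ such as 4.5.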
Now we shall apply the ideas of statistical geometry to make comparisons between various term structures of the form (4.4). For the density function $p(T) = -\partial_T P_{0T}$ in this case we obtain

$$p(T,R) = R\left(1 + \frac{RT}{k}\right)^{-(k+1)}. \qquad (4.5)$$

Here we find it convenient to label the density function by the flat rate $R$. Note that in the limit $k \to \infty$ we have $p(T,R) \to Re^{-RT}$. First consider the nonparametric separation between different term structures in this model via the spherical distance of Bhattacharyya given in formula (3.5), where in the present example we write $p_i(T) = p(T, R_i)$ for $i = 1, 2$. A direct integration leads to the expression

$$\phi(p_1,p_2) = \cos^{-1}\left(\frac{\sqrt{R_1R_2}}{R_2-R_1}\,\log\frac{R_2}{R_1}\right) \qquad (4.6)$$

for the distance when $k = 1$, whereas in the limit $k \to \infty$ (continuous compounding) we have

$$\phi(p_1,p_2) = \cos^{-1}\left(\frac{2\sqrt{R_1R_2}}{R_1+R_2}\right). \qquad (4.7)$$

It is interesting to observe that the bracketed term in (4.7) is given by the ratio of the geometric and arithmetic means of the two rates.
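The closed forms for $k = 1$ and $k \to \infty$ can be compared with a direct quadrature of (3.5) for the family (4.5). This is a minimal sketch with assumed quadrature settings and illustrative names. The same routine also returns the angle for intermediate compounding frequencies, such as $k = 2$, for which no closed form is quoted here.

```python
import math


def flat_density(x, R, k):
    """Density (4.5); pass k = math.inf for the continuously compounded limit R exp(-R x)."""
    if math.isinf(k):
        return R * math.exp(-R * x)
    return R * (1.0 + R * x / k) ** (-(k + 1.0))


def integrate_half_line(f, scale=20.0, n=20000):
    """Midpoint rule for the integral of f over [0, inf), using x = scale * t / (1 - t)."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += f(scale * t / (1.0 - t)) * scale / (1.0 - t) ** 2
    return total / n


def flat_angle(R1, R2, k):
    """Bhattacharyya angle (3.5) between two flat curves of the family (4.5), by quadrature."""
    c = integrate_half_line(lambda x: math.sqrt(flat_density(x, R1, k) * flat_density(x, R2, k)))
    return math.acos(min(1.0, c))


def angle_simple(R1, R2):
    """Closed form for k = 1; requires R1 != R2."""
    return math.acos(math.sqrt(R1 * R2) / (R2 - R1) * math.log(R2 / R1))


def angle_continuous(R1, R2):
    """Closed form for k -> infinity: ratio of geometric and arithmetic means."""
    return math.acos(2.0 * math.sqrt(R1 * R2) / (R1 + R2))


print(flat_angle(0.03, 0.06, 1.0), angle_simple(0.03, 0.06))
print(flat_angle(0.03, 0.06, math.inf), angle_continuous(0.03, 0.06))
print(flat_angle(0.03, 0.06, 2.0))  # semiannual compounding, no closed form quoted
```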

Alternatively, we can view (4.5) as a parametric family of distributions, parameterised by the flat rate $R$. Then it is natural to consider the Fisher-Rao distance between the two term structures characterised by $R_1$ and $R_2$. A straightforward calculation then leads to a simple distance formula given by

$$D(R_1,R_2) = \frac{1}{2}\sqrt{\frac{k}{k+2}}\,\log\frac{R_2}{R_1}, \qquad (4.8)$$

where we have assumed $R_2 > R_1$.
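For the family (4.5) the metric is one-dimensional, so the Fisher-Rao distance is just $\int_{R_1}^{R_2}\sqrt{g(R)}\,dR$. The sketch below computes $g(R)$ from (2.5) by quadrature, using the normalisation of (2.5) in which $g$ is one quarter of the Fisher information, and integrates its square root in $\log R$. Under that normalisation one finds $g(R) = k/(4(k+2)R^2)$, so the numerical distance should match $\frac{1}{2}\sqrt{k/(k+2)}\,\log(R_2/R_1)$. The quadrature settings and function names are assumptions.

```python
import math


def flat_xi(x, R, k):
    """Square root of the density (4.5): sqrt(R) * (1 + R x / k) ** (-(k + 1) / 2)."""
    return math.sqrt(R) * (1.0 + R * x / k) ** (-(k + 1.0) / 2.0)


def flat_metric(R, k, n=4000, rel_eps=1e-5):
    """One-parameter metric g(R) of (2.5) for the family (4.5), by quadrature over x."""
    h = R * rel_eps
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        x = t / (R * (1.0 - t))  # x runs over (0, inf) on the natural scale 1 / R
        jacobian = 1.0 / (R * (1.0 - t) ** 2)
        d_xi = (flat_xi(x, R + h, k) - flat_xi(x, R - h, k)) / (2.0 * h)
        total += d_xi * d_xi * jacobian
    return total / n


def flat_distance(R1, R2, k, m=16):
    """Length of the curve from R1 to R2 in the metric g(R), midpoint rule in u = log R."""
    u1, u2 = math.log(R1), math.log(R2)
    du = (u2 - u1) / m
    length = 0.0
    for j in range(m):
        R = math.exp(u1 + (j + 0.5) * du)
        length += math.sqrt(flat_metric(R, k)) * R * du  # ds = sqrt(g) dR and dR = R du
    return length


def flat_distance_closed_form(R1, R2, k):
    """0.5 * sqrt(k / (k + 2)) * log(R2 / R1), assuming R2 > R1."""
    return 0.5 * math.sqrt(k / (k + 2.0)) * math.log(R2 / R1)


for k in (1.0, 2.0, 12.0):
    print(k, flat_distance(0.03, 0.06, k), flat_distance_closed_form(0.03, 0.06, k))
```

Integrating in $\log R$ is a deliberate choice: since $\sqrt{g(R)}\,R$ is constant for this family, the outer midpoint rule is exact and any discrepancy comes only from the inner quadrature.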



5. Interest rate dynamics 

The formalism we have developed so far is essentially a static one, set in the present. Now we turn to the problem of developing a dynamical theory of interest rates. The idea is that, at each instant of time, the yield curve is characterised by a term structure density according to the scheme described in the previous sections. Then, as time passes, the density function evolves randomly. As a consequence we obtain a measure-valued process. In particular, we obtain a process on $\mathcal{D}(\mathbb{R}_+)$. Our goal in this section is to determine a set of conditions on this process necessary and sufficient to ensure that the resulting interest rate dynamics will be arbitrage-free.

We shall assume the reader is familiar with the general theory of interest rate dynamics as laid out, for example, in Carverhill (1994), Rogers (1994), Hughston (1996), Baxter (1997), Musiela & Rutkowski (1997), Brody (2000) or Hunt & Kennedy (2000). For the general discount bond dynamics, let us write

$$dP_{tT} = \mu_{tT}\,dt + \Sigma_{tT}\cdot dW_t, \qquad (5.1)$$

where $\mu_{tT}$ and $\Sigma_{tT}$ are the absolute drift and absolute volatility processes, respectively, for a bond with maturity $T$. Here, $W_t$ is a vector Brownian motion, $\Sigma_{tT}$ is a vector process, and there is an inner product implied between $\Sigma_{tT}$ and $dW_t$, signified by a dot. We need not specify the dimensionality of the Brownian motion, which might be infinite, and indeed in some respects the infinite-dimensional setting is the most natural one. In fact, it suffices for our purposes merely to assume that $P_{tT}$ is a one-parameter family of continuous semimartingales on the given probability space, with respect to the given filtration. However, for simplicity of exposition we shall stick to the case where the relevant stochastic basis is generated by a multidimensional Brownian motion. Here, as in Flesaker & Hughston (1997a,b), we regard the discount bond dynamics as the natural starting position, rather than, say, the instantaneous forward rate dynamics (Heath et al. 1992), which we need not consider here directly. We shall assume nevertheless, as in the HJM framework, that the processes $\mu_{tT}$ and $\Sigma_{tT}$ are both smooth in the variable $T$, and that sufficiently strong technical conditions are in place to ensure that the instantaneous forward rate processes are semimartingales.

In order to extend the analysis of the previous section it is convenient to introduce what is sometimes referred to as the 'Musiela parameterisation', given by

$$B_{tx} = P_{t,t+x}, \qquad (5.2)$$

where $T = t + x$ represents the maturity date of the bond, and hence $x$ is the time left until maturity. Thus $B_{tx}$ is the value at time $t$ of a discount bond that has $x$ years left to mature. This choice of parameterisation has already been shown to be useful in the geometric analysis of interest rates (Bjork & Svensson 1999, Bjork & Christensen 1999, Bjork & Gombani 1999, Bjork 2000). We note that $B_{t0} = 1$ for all $t$, and that $B_{tx} \to 0$ as $x \to \infty$. It follows that

$$p_t(x) = -\partial_x B_{tx} \qquad (5.3)$$

is a measure-valued process in the sense that, for each value of $t$, the random function $p_t(x)$ satisfies $p_t(x) > 0$ and the normalisation condition

$$\int_0^\infty p_t(x)\,dx = 1. \qquad (5.4)$$

Here we have chosen the notation $p_t(x)$, which makes the $x$ dependence more prominent, to emphasise the fact that, for each value of $t$, and conditional on information given up to time $t$, $p_t(x)$ is a density function, though we might have written $p_{tx}$ instead. As a consequence $p_t(x)$ describes a process on $\mathcal{D}(\mathbb{R}_+)$. By consideration of (5.1) and (5.2) we deduce for the dynamics of $B_{tx}$ that

$$dB_{tx} = dP_{tT}\big|_{T=t+x} + \partial_x B_{tx}\,dt, \qquad (5.5)$$

and thus, by use of (5.1), that

$$dB_{tx} = \left(\mu_{t,t+x} + \partial_x B_{tx}\right)dt + \Sigma_{t,t+x}\cdot dW_t, \qquad (5.6)$$

where $\partial_x = \partial/\partial x$. Differentiating this expression with respect to $x$ and introducing the measure-valued process $p_t(x)$ according to formula (5.3) we therefore obtain

$$dp_t(x) = \left(-\partial_x\mu_{t,t+x} + \partial_x p_t(x)\right)dt - \partial_x\Sigma_{t,t+x}\cdot dW_t. \qquad (5.7)$$

A further simplification is then achieved by introducing the notation

$$\beta_{tx} = -\partial_x\mu_{t,t+x} \qquad (5.8)$$

and

$$\omega_{tx} = -\partial_x\Sigma_{t,t+x}, \qquad (5.9)$$

which gives us

$$dp_t(x) = \left(\beta_{tx} + \partial_x p_t(x)\right)dt + \omega_{tx}\cdot dW_t. \qquad (5.10)$$

In the foregoing discussion we have not yet imposed the arbitrage-free condition. This is given by the drift constraint

$$\mu_{tT} = r_tP_{tT} + \Sigma_{tT}\cdot\lambda_t, \qquad (5.11)$$

where $r_t$ is the short rate and $\lambda_t$ is the process for the market price of risk. We note that $\lambda_t$, like $\Sigma_{tT}$, is a vector process. However, $\lambda_t$ does not depend on the maturity $T$. The absence of arbitrage ensures the existence of $\lambda_t$. For our purposes we do not need to insist that the bond market is complete: all we require is the existence of a pricing kernel, or equivalently the existence of a self-financing 'natural numeraire' portfolio with value process $N_t$ such that $P_{tT}/N_t$ is a martingale for each value of $T$ (cf. Flesaker & Hughston 1997c). The numeraire process satisfies

$$\frac{dN_t}{N_t} = \left(r_t + \lambda_t^2\right)dt + \lambda_t\cdot dW_t, \qquad (5.12)$$

with $\lambda_t^2 = \lambda_t\cdot\lambda_t$, and the corresponding pricing kernel is given by $1/N_t$. As a consequence of the constraint (5.11) we then have

$$\mu_{t,t+x} = r_tB_{tx} + \Sigma_{t,t+x}\cdot\lambda_t, \qquad (5.13)$$

and therefore, by differentiation of this expression with respect to $x$, we obtain

$$\beta_{tx} = r_t\,p_t(x) + \omega_{tx}\cdot\lambda_t. \qquad (5.14)$$

Inserting (5.14) in (5.10) we are thus able to express the dynamics of the density function $p_t(x)$ in the form

$$dp_t(x) = \left(r_t\,p_t(x) + \partial_x p_t(x)\right)dt + \omega_{tx}\cdot\left(dW_t + \lambda_t\,dt\right). \qquad (5.15)$$

Before proceeding further, let us verify, as a consistency check, that the dynamics given by (5.15) preserves the normalisation condition on $p_t(x)$ given by (5.4). Integrating the right-hand side of (5.15) with respect to $x$ and equating the drift and volatility terms separately to zero leads to the relations

$$r_t + \int_0^\infty\partial_x p_t(x)\,dx = 0 \qquad (5.16)$$

and

$$\int_0^\infty\omega_{tx}\,dx = 0, \qquad (5.17)$$

which must hold for all $t$. Condition (5.16) is satisfied because $p_t(x) \to 0$ as $x \to \infty$ and

$$p_t(0) = r_t, \qquad (5.18)$$

since $p_t(0) = -\partial_x B_{tx}\big|_{x=0}$ is the instantaneous forward rate for immediate maturity, that is, the short rate. Condition (5.17) is satisfied because, by definition, we have $\omega_{tx} = -\partial_x\Sigma_{t,t+x}$, and the absolute volatility $\Sigma_{t,t+x}$ vanishes both as $x \to 0$ (a maturing bond has a definite value and thus has no absolute volatility), and as $x \to \infty$ (a bond with infinite maturity has no value, and hence no absolute volatility).

Summing up matters so far, we see that in (5.15) we are able to cast the standard HJM arbitrage-free interest rate dynamics in the form of a measure-valued process $p_t(x)$ subject to the constraints (5.16) and (5.17). At first glance the role of the short rate $r_t$ in (5.15) seems anomalous, because it might appear that this has to be specified separately. However, by virtue of (5.18) we can incorporate $r_t$ directly into the dynamics of $p_t(x)$.

In fact, there is another way of expressing (5.15) which is very suggestive, and ties in naturally with the Hilbert space approach to dynamics introduced in §7. First we note that (5.16) can be rewritten in the form

$$r_t = -\int_0^\infty p_t(x)\,\partial_x\ln p_t(x)\,dx. \qquad (5.19)$$

In other words, $r_t$ is minus the expectation of the gradient of the log-likelihood function. Here the expectation is taken with respect to $p_t(x)$ itself. Writing $E_p$ for this abstract expectation, we have

$$dp_t(x) = p_t(x)\left(\partial_x\ln p_t(x) - E_p[\partial_x\ln p_t(x)]\right)dt + \omega_{tx}\cdot dW^*_t, \qquad (5.20)$$

where $dW^*_t = dW_t + \lambda_t\,dt$. We note that $W^*_t$ has the interpretation of being a Brownian motion with respect to the risk-neutral measure associated with the given pricing kernel. In the risk-neutral measure, for which the term involving $\lambda_t$ effectively disappears, the remaining drift for $p_t(x)$ is determined by the deviation of $\partial_x\ln p_t(x)$ from its abstract mean.
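The identities (5.4), (5.18) and (5.19) hold for any admissible curve, so they can be checked on an example at a fixed time $t$. The sketch below assumes a hypothetical Nelson-Siegel-type forward curve $f(x)$ in the Musiela variable (the functional form and parameter values are illustrative, not taken from the text), builds $B_{tx}$ and $p_t(x) = f(x)B_{tx}$, and evaluates the three quantities numerically. It uses $\partial_x\ln p_t(x) = f'(x)/f(x) - f(x)$, which follows from $\ln p_t(x) = \ln f(x) - \int_0^x f(u)\,du$.

```python
import math

# Hypothetical Nelson-Siegel-type forward curve in the Musiela variable x (time to maturity);
# the functional form and parameter values are illustrative assumptions.
B0, B1, B2, A = 0.05, -0.02, 0.01, 2.0


def forward(x):
    e = math.exp(-x / A)
    return B0 + B1 * e + B2 * (x / A) * e


def forward_slope(x):
    """Derivative of forward() with respect to x."""
    e = math.exp(-x / A)
    return -(B1 / A) * e + (B2 / A) * e * (1.0 - x / A)


def discount(x):
    """B_{tx} = exp(-integral_0^x f(u) du), with the integral of forward() in closed form."""
    e = math.exp(-x / A)
    integral = B0 * x + B1 * A * (1.0 - e) + B2 * (A * (1.0 - e) - x * e)
    return math.exp(-integral)


def density(x):
    """p_t(x) = -dB/dx = f(x) B(x), cf. (5.3)."""
    return forward(x) * discount(x)


def score(x):
    """d/dx log p_t(x) = f'(x) / f(x) - f(x), avoiding logs of tiny numbers."""
    return forward_slope(x) / forward(x) - forward(x)


def integrate_half_line(f, scale=20.0, n=20000):
    """Midpoint rule for the integral of f over [0, inf), using x = scale * t / (1 - t)."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += f(scale * t / (1.0 - t)) * scale / (1.0 - t) ** 2
    return total / n


total_mass = integrate_half_line(density)                                 # (5.4)
short_rate = density(0.0)                                                 # (5.18): equals f(0)
minus_mean_score = -integrate_half_line(lambda x: density(x) * score(x))  # (5.19)
print(total_mass, short_rate, minus_mean_score)
```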

Let us now examine more closely the volatility term $\omega_{tx}$ appearing in (5.20), with a view to gaining a better understanding of the significance of the volatility constraint (5.17). Because $p_t(x)$ must remain positive for all values of $x$, the coefficient of $dW^*_t$ in (5.20) must be of the form

$$\omega_{tx} = p_t(x)\,\sigma_{tx} \qquad (5.21)$$

for some bounded process $\sigma_{tx}$, to ensure that $\omega_{tx}$ dies off appropriately for values of $x$ such that $p_t(x)$ approaches zero. As a consequence, we can write (5.15) in the quasi-lognormal form

$$\frac{dp_t(x)}{p_t(x)} = \left(r_t + \partial_x\ln p_t(x)\right)dt + \sigma_{tx}\cdot dW^*_t, \qquad (5.22)$$

and for the constraint (5.17) we have

$$E_p[\sigma_{tx}] = 0, \qquad (5.23)$$

which can be satisfied by writing

$$\sigma_{tx} = \nu_{tx} - E_p[\nu_{tx}], \qquad (5.24)$$

where $\nu_{tx}$ is an exogenously specifiable unconstrained process. Here, for any process $A_{tx}$ we define $E_p[A_{tx}] = \int_0^\infty p_t(x)\,A_{tx}\,dx$. The results established above can then be summarised as follows.
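The centring (5.24) is easy to mimic numerically: any unconstrained volatility $\nu_{tx}$ becomes an admissible $\sigma_{tx}$ after subtracting its abstract mean, and then both (5.23) and (5.17) hold. In the sketch below the density is the flat continuously compounded curve $Re^{-Rx}$, and the two bounded volatility shapes are hypothetical choices made for illustration.

```python
import math

R = 0.05  # flat continuously compounded density p_t(x) = R exp(-R x), used as a stand-in


def p(x):
    return R * math.exp(-R * x)


def nu(x):
    """Hypothetical bounded two-factor volatility shapes nu_{tx}, for illustration only."""
    return (0.01 * math.exp(-x / 5.0), 0.005 * x / (1.0 + x))


def integrate_half_line(f, scale=20.0, n=20000):
    """Midpoint rule for the integral of f over [0, inf), using x = scale * t / (1 - t)."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += f(scale * t / (1.0 - t)) * scale / (1.0 - t) ** 2
    return total / n


def expect(g):
    """Abstract expectation E_p[g] = integral of p(x) g(x) dx."""
    return integrate_half_line(lambda x: p(x) * g(x))


nu_bar = tuple(expect(lambda x, j=j: nu(x)[j]) for j in range(2))


def sigma(x):
    """(5.24): sigma_{tx} = nu_{tx} - E_p[nu_{tx}], factor by factor."""
    return tuple(v - m for v, m in zip(nu(x), nu_bar))


def omega(x):
    """(5.21): omega_{tx} = p_t(x) sigma_{tx}."""
    return tuple(p(x) * s for s in sigma(x))


print(nu_bar)
print([expect(lambda x, j=j: sigma(x)[j]) for j in range(2)])                # (5.23)
print([integrate_half_line(lambda x, j=j: omega(x)[j]) for j in range(2)])  # (5.17)
```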

Proposition 2. The general admissible term structure evolution based on the information set generated by a multidimensional Brownian motion $W_t$ is given by a measure-valued process $p_t(x)$ in $\mathcal{D}(\mathbb{R}_+)$ satisfying

$$\frac{dp_t(x)}{p_t(x)} = \left(\partial_x\ln p_t(x) - E_p[\partial_x\ln p_t(x)]\right)dt + \left(\nu_{tx} - E_p[\nu_{tx}]\right)\cdot\left(dW_t + \lambda_t\,dt\right), \qquad (5.25)$$

where the processes $\lambda_t$ and $\nu_{tx}$ are specified exogenously, along with the initial term structure density $p_0(x)$.

An advantage of the particular expression (5.25) given for the dynamics above is that the preservation of the normalisation condition on $p_t(x)$ is evident by inspection, because this is equivalent to the relation

$$E_p\left[\frac{dp_t(x)}{p_t(x)}\right] = 0. \qquad (5.26)$$

An alternative expression for (5.25), which brings out more explicitly the nonlinearities in the dynamics, is given by

$$dp_t(x) = \left(\partial_x p_t(x) + p_t(0)\,p_t(x)\right)dt + p_t(x)\left(\nu_{tx} - \int_0^\infty p_t(y)\,\nu_{ty}\,dy\right)\cdot dW^*_t, \qquad (5.27)$$

where $dW^*_t = dW_t + \lambda_t\,dt$ as defined earlier.
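Formula (5.27) lends itself to a simple explicit discretisation. The following sketch performs one Euler step on a truncated uniform grid, with hypothetical two-factor volatility shapes and risk-neutral increments drawn from a seeded generator. It is meant only to show that the scheme preserves the normalisation (5.4) up to discretisation error, not to serve as a production simulator; the grid length, step sizes and function names are assumptions.

```python
import math
import random


def trapezoid(values, dx):
    """Trapezoidal rule on a uniform grid."""
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))


def x_derivative(values, dx):
    """Central differences in the interior, one-sided differences at the two ends."""
    n = len(values)
    d = [0.0] * n
    d[0] = (values[1] - values[0]) / dx
    d[-1] = (values[-1] - values[-2]) / dx
    for i in range(1, n - 1):
        d[i] = (values[i + 1] - values[i - 1]) / (2.0 * dx)
    return d


def euler_step(p, nu, dw_star, dx, dt):
    """One explicit Euler step of (5.27) on a truncated uniform grid x_i = i * dx.

    p       -- density values p_t(x_i)
    nu      -- one list of values nu_{t x_i} per Brownian factor
    dw_star -- risk-neutral increments dW*_t, one per factor
    """
    dp_dx = x_derivative(p, dx)
    r = p[0]  # short rate, by (5.18)
    new_p = [pi + (di + r * pi) * dt for pi, di in zip(p, dp_dx)]
    for nu_j, dw_j in zip(nu, dw_star):
        nu_bar = trapezoid([pi * vi for pi, vi in zip(p, nu_j)], dx)  # E_p[nu_j]
        for i, pi in enumerate(p):
            new_p[i] += pi * (nu_j[i] - nu_bar) * dw_j
    return new_p


R, dx, n_cells, dt = 0.05, 0.05, 8000, 1.0 / 252.0
xs = [i * dx for i in range(n_cells + 1)]       # grid on [0, 400]
p0 = [R * math.exp(-R * x) for x in xs]         # flat continuously compounded start
nu = [[0.01 * math.exp(-x / 5.0) for x in xs],  # hypothetical volatility shapes
      [0.005 * x / (1.0 + x) for x in xs]]
rng = random.Random(7)
dw_star = [rng.gauss(0.0, math.sqrt(dt)) for _ in nu]
p1 = euler_step(p0, nu, dw_star, dx, dt)
print(trapezoid(p0, dx), trapezoid(p1, dx))
```

For the flat continuously compounded start $p(x) = Re^{-Rx}$ the drift in (5.27) vanishes identically, since $\partial_x p = -Rp$ and $p(0) = R$; calling `euler_step` with zero increments therefore leaves the grid values essentially unchanged, and the density moves only through the volatility term.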



6. Principal moment analysis 

The characterisation of the yield curve as an abstract probability density enables us to develop a rigorous analogue of the classical 'principal component' analysis often used in the study of yield curve dynamics. To this end we let $p_t(x) = -\partial_x P_{t,t+x}$ be the density process associated with an admissible family of discount bond prices, and define the moment processes

$$\bar{x}_t = \int_0^\infty x\,p_t(x)\,dx \qquad (6.1)$$

and

$$\overline{x^n_t} = \int_0^\infty x^n\,p_t(x)\,dx \qquad (6.2)$$

for $n \ge 2$, along with the central moment processes

$$x^{(n)}_t = \int_0^\infty\left(x - \bar{x}_t\right)^n p_t(x)\,dx. \qquad (6.3)$$

It is important to note that in some cases the relevant moments may not exist. For example, in the case of a continuously compounded flat yield curve given at $t = 0$ by the density function $\rho_0(x) = Re^{-Rx}$, we have $\bar{x}_0 = R^{-1}$, $\tilde{x}^{(2)}_0 = R^{-2}$, $\tilde{x}^{(3)}_0 = 2R^{-3}$ and $\tilde{x}^{(4)}_0 = 9R^{-4}$ for the mean and the second, third and fourth central moments. On the other hand, in the example of the simple flat term structure for which $\rho_0(x) = R/(1 + Rx)^2$ we find that none of the moments exist, on account of the fatness of the tail of the distribution. In fact, for the flat rate term structures with compounding frequency $\kappa$ the moments exist only up to order $\kappa - 1$.
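The moment values quoted above can be reproduced numerically. The sketch below is illustrative only. It assumes $R = 0.5$ per year, truncates the maturity integral at 120 years, and uses Simpson's rule. For the simple flat curve it evaluates the truncated mean $\int_0^L x\rho_0(x)\,dx = R^{-1}\big(\ln(1+RL) + (1+RL)^{-1} - 1\big)$ in closed form to show its logarithmic growth in $L$.

```python
import math

def simpson(f, lo, hi, n=20000):
    """Composite Simpson's rule for f on [lo, hi] with an even number n of intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

R = 0.5          # illustrative continuously compounded flat rate (per year)
UPPER = 120.0    # truncation point; the tail beyond it is negligible for this R

def rho0(x):
    return R * math.exp(-R * x)

mean = simpson(lambda x: x * rho0(x), 0.0, UPPER)
central = {n: simpson(lambda x, n=n: (x - mean) ** n * rho0(x), 0.0, UPPER)
           for n in (2, 3, 4)}
closed_form = {2: R ** -2, 3: 2 * R ** -3, 4: 9 * R ** -4}
for n in (2, 3, 4):
    print(n, round(central[n], 6), closed_form[n])

# Simple flat curve rho0(x) = R / (1 + R x)^2: the mean truncated at L has the
# closed form below, which grows like ln(L) / R, so the full mean diverges.
def truncated_mean_simple(L):
    u = 1 + R * L
    return (math.log(u) + 1 / u - 1) / R

partial_means = [truncated_mean_simple(L) for L in (1e2, 1e4, 1e6)]
print(partial_means)
```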
The first four moments, if they exist, determine the mean, variance, skewness and kurtosis of the distribution of the abstract random variable $X$ characterising the yield curve, and we refer to these (and other) moments as the 'principal moments' of the given term structure. At $t = 0$ the mean $\bar{x}_0$ determines a characteristic time-scale associated with the given term structure, and its inverse $1/\bar{x}_0$ can be thought of as an associated characteristic yield. The difference $\tilde{x}^{(2)}_0 - \bar{x}_0^2$ then measures the departure of the given term structure from flatness on a continuously compounded basis, because for an exponential distribution the variance equals the square of the mean.

It is legitimate to conjecture that for some purposes the specification of, e.g., the first three or four moments will be sufficient to provide an accurate representation of the term structure. One way of implementing this idea is to introduce the entropy $S_\rho$ of the given distribution, defined by

$$S_\rho = -\int_0^\infty \rho(x)\ln\rho(x)\,dx. \tag{6.4}$$

Because $\rho(x)$ has dimensions of inverse time, $S_\rho$ is defined only up to an overall additive constant: changing the unit of time by a factor $c$ shifts $S_\rho$ by $\ln c$. Therefore, only the difference of the entropies associated with two yield curves has an invariant significance.

For yield curve calibration we propose that $\rho(x)$ should be chosen such that $S_\rho$ is maximised subject to the constraints of the data available. For example, if we are given as data only the mean $\bar{x}_0$, then the maximum entropy term structure is $\rho_0(x) = Re^{-Rx}$, where $R = 1/\bar{x}_0$.
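A small numerical comparison illustrates both the maximum-entropy statement and the additive-constant remark. It is a sketch under assumptions of our own. It uses three densities on $[0,\infty)$ with the same mean $\bar{x}_0 = 2$ years: exponential, a gamma density of shape 2, and a uniform density on $[0, 2\bar{x}_0]$. Integrals use Simpson's rule truncated at 80 years, and a change of units from years to months shows that each entropy shifts by $\ln 12$ while the differences are unchanged. The exponential density should come out largest, with entropy $1 + \ln\bar{x}_0$.

```python
import math

def simpson(f, lo, hi, n=20000):
    """Composite Simpson's rule for f on [lo, hi] with an even number n of intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def entropy(rho, upper):
    """S = -integral of rho ln rho over [0, upper], using the convention 0 ln 0 = 0."""
    def integrand(x):
        p = rho(x)
        return -p * math.log(p) if p > 0 else 0.0
    return simpson(integrand, 0.0, upper)

MEAN = 2.0         # common mean maturity in years (illustrative)
R = 1.0 / MEAN
theta = MEAN / 2   # gamma(2) scale giving the same mean
densities = {
    "exponential": (lambda x: R * math.exp(-R * x), 80.0),
    "gamma(2)": (lambda x: x * math.exp(-x / theta) / theta ** 2, 80.0),
    "uniform": (lambda x: 1.0 / (2 * MEAN), 2 * MEAN),
}
S = {k: entropy(f, up) for k, (f, up) in densities.items()}

# Re-express each density in months: rho_months(y) = rho(y / c) / c.
c = 12.0
S_months = {k: entropy(lambda y, f=f: f(y / c) / c, up * c)
            for k, (f, up) in densities.items()}

for k in S:
    print(f"{k:12s} years: {S[k]:.6f}  months: {S_months[k]:.6f}")
```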

It is also of great interest to study the dynamics of the principal characteristics in the case of a general admissible arbitrage-free term structure. We examine here, in particular, the mean and the variance processes. For this purpose we introduce a simplified notation $v_t = \tilde{x}^{(2)}_t$ for the variance process, i.e.,

$$v_t = \int_0^\infty x^2\rho_t(x)\,dx - \bar{x}_t^2, \tag{6.5}$$

where the mean process $\bar{x}_t$ is given as in (6.1). We assume that both $\rho_t(x)$ and the discount bond volatility $\Sigma_{t,t+x}$ fall off to zero sufficiently rapidly to ensure that $\lim_{x\to\infty} x^n\rho_t(x) = 0$ and $\lim_{x\to\infty} x^n\Sigma_{t,t+x} = 0$ for $n = 1, 2$, and that the integrals $\int_0^\infty x^n\rho_t(x)\,dx$ and $\int_0^\infty x^{n-1}\Sigma_{t,t+x}\,dx$ exist for $n = 1, 2$. A straightforward calculation then leads us to the following conclusion:

Proposition 3. The first principal moment $\bar{x}_t$ of an admissible, arbitrage-free term structure satisfies the dynamical law

$$d\bar{x}_t = (r_t\bar{x}_t - 1)\,dt + \bar{\Sigma}_t\cdot dW^*_t, \tag{6.6}$$

where $\bar{\Sigma}_t = \int_0^\infty \Sigma_{t,t+x}\,dx$.

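Proof. Starting with (5.22) and (6.1) we have

$$d\bar{x}_t = \left(r_t\int_0^\infty x\,\rho_t(x)\,dx + \int_0^\infty x\,\partial_x\rho_t(x)\,dx\right)dt - \int_0^\infty x\,\partial_x\Sigma_{t,t+x}\,dx\cdot dW^*_t \tag{6.7}$$

by use of (5.9). Then, integrating by parts and using the assumed asymptotic behaviours for $\rho_t(x)$ and $\Sigma_{t,t+x}$, we find $\int_0^\infty x\,\partial_x\rho_t(x)\,dx = -1$ and $-\int_0^\infty x\,\partial_x\Sigma_{t,t+x}\,dx = \bar{\Sigma}_t$, which gives the desired result. □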
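The integration by parts that takes (6.7) to (6.6) can be checked numerically. The sketch below is illustrative and not part of the original analysis. It assumes a hypothetical two-component density $\rho(x) = wae^{-ax} + (1-w)b^2xe^{-bx}$ and a hypothetical one-factor bond volatility $\Sigma_{t,t+x} = s\,x\,e^{-cx}$. This volatility vanishes at $x = 0$ and decays as $x \to \infty$, as the asymptotic assumptions require.

```python
import math

def simpson(values, h):
    """Composite Simpson's rule for equally spaced samples (even number of intervals)."""
    n = len(values) - 1
    return (values[0] + values[-1]
            + 4 * sum(values[1:n:2]) + 2 * sum(values[2:n:2])) * h / 3

X_MAX, N = 800.0, 40000
h = X_MAX / N
xs = [i * h for i in range(N + 1)]

w, a, b = 0.8, 0.05, 0.1     # hypothetical density parameters
s, c = 0.01, 0.08            # hypothetical volatility parameters
rho = [w * a * math.exp(-a * x) + (1 - w) * b * b * x * math.exp(-b * x) for x in xs]
drho = [-w * a * a * math.exp(-a * x) + (1 - w) * b * b * (1 - b * x) * math.exp(-b * x)
        for x in xs]
Sigma = [s * x * math.exp(-c * x) for x in xs]
dSigma = [s * (1 - c * x) * math.exp(-c * x) for x in xs]
r = rho[0]                   # short rate r_t = rho_t(0)

x_bar = simpson([x * p for x, p in zip(xs, rho)], h)
# Coefficients as they appear in (6.7), before integrating by parts.
drift_67 = simpson([x * (r * p + dp) for x, p, dp in zip(xs, rho, drho)], h)
vol_67 = -simpson([x * ds for x, ds in zip(xs, dSigma)], h)
# Coefficients as stated in (6.6).
Sigma_bar = simpson(Sigma, h)
drift_66 = r * x_bar - 1

print(drift_67, drift_66, vol_67, Sigma_bar)
```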
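We note that there is a critical level $\bar{x}^*_t$ for the first principal moment given by

$$\bar{x}^*_t = \frac{1}{r_t}\big(1 - \lambda_t\cdot\bar{\Sigma}_t\big), \tag{6.8}$$

at which the drift of $\bar{x}_t$ with respect to the physical measure, namely $r_t\bar{x}_t - 1 + \lambda_t\cdot\bar{\Sigma}_t$ (obtained by substituting $dW^*_t = dW_t + \lambda_t\,dt$ in (6.6)), vanishes. When $\bar{x}_t > \bar{x}^*_t$ the drift of $\bar{x}_t$ is positive, and the drift increases further as $\bar{x}_t$ increases. On the other hand, when $\bar{x}_t < \bar{x}^*_t$, the drift of $\bar{x}_t$ is negative, and the drift decreases further as $\bar{x}_t$ decreases. For the variance process, we have: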

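Proposition 4. The second principal moment $v_t$ of an admissible, arbitrage-free term structure satisfies the dynamical law

$$dv_t = \big(r_t(v_t - \bar{x}_t^2) - \bar{\Sigma}_t^2\big)\,dt + 2\big(\bar{\Sigma}^{(1)}_t - \bar{x}_t\bar{\Sigma}_t\big)\cdot dW^*_t, \tag{6.9}$$

where $\bar{\Sigma}^{(1)}_t = \int_0^\infty x\,\Sigma_{t,t+x}\,dx$ and $\bar{\Sigma}_t^2 = \bar{\Sigma}_t\cdot\bar{\Sigma}_t$.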

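Proof. Starting with formula (6.5) for $v_t$ we have

$$dv_t = \int_0^\infty x^2\,d\rho_t(x)\,dx - d(\bar{x}_t^2). \tag{6.10}$$

For the first term we obtain

$$\int_0^\infty x^2\,d\rho_t(x)\,dx = \left(r_t\int_0^\infty x^2\rho_t(x)\,dx + \int_0^\infty x^2\,\partial_x\rho_t(x)\,dx\right)dt - \int_0^\infty x^2\,\partial_x\Sigma_{t,t+x}\,dx\cdot dW^*_t, \tag{6.11}$$

where we have used (5.9) and (5.22). As a consequence of the assumed asymptotic behaviour of $\rho_t(x)$ and $\Sigma_{t,t+x}$ this becomes

$$\int_0^\infty x^2\,d\rho_t(x)\,dx = \big(r_t\bar{x}^{(2)}_t - 2\bar{x}_t\big)\,dt + 2\bar{\Sigma}^{(1)}_t\cdot dW^*_t, \tag{6.12}$$

after an integration by parts. For the second term in (6.10) we have

$$d(\bar{x}_t^2) = 2\bar{x}_t\,d\bar{x}_t + (d\bar{x}_t)^2 \tag{6.13}$$

by Itô's lemma, and thus

$$d(\bar{x}_t^2) = \big(2r_t\bar{x}_t^2 - 2\bar{x}_t + \bar{\Sigma}_t^2\big)\,dt + 2\bar{x}_t\bar{\Sigma}_t\cdot dW^*_t \tag{6.14}$$

by use of Proposition 3. Combining (6.12) and (6.14), and using the definition (6.5), which gives $\bar{x}^{(2)}_t - 2\bar{x}_t^2 = v_t - \bar{x}_t^2$, we obtain (6.9). □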
In this case we recall that the difference $v_t - \bar{x}_t^2$ acts as a simple measure of the extent to which the distribution deviates from the 'flat' term structure. As a consequence we see that the effect of the dynamics here is that the second principal moment of the term structure tends to increase, i.e., has a positive drift with respect to the physical measure, provided $v_t - \bar{x}_t^2$ is already above the level given by

$$v_t - \bar{x}_t^2 = \frac{1}{r_t}\Big(\bar{\Sigma}_t^2 - 2\lambda_t\cdot\big(\bar{\Sigma}^{(1)}_t - \bar{x}_t\bar{\Sigma}_t\big)\Big). \tag{6.15}$$
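The same kind of check applies to Proposition 4 and to the critical levels (6.8) and (6.15). The sketch again assumes a hypothetical density $\rho(x) = wae^{-ax} + (1-w)b^2xe^{-bx}$, a hypothetical one-factor volatility $\Sigma_{t,t+x} = s\,x\,e^{-cx}$ and an illustrative constant market price of risk $\lambda_t = 0.1$. Physical-measure drifts are obtained by substituting $dW^*_t = dW_t + \lambda_t\,dt$.

```python
import math

def simpson(values, h):
    """Composite Simpson's rule for equally spaced samples (even number of intervals)."""
    n = len(values) - 1
    return (values[0] + values[-1]
            + 4 * sum(values[1:n:2]) + 2 * sum(values[2:n:2])) * h / 3

X_MAX, N = 800.0, 40000
h = X_MAX / N
xs = [i * h for i in range(N + 1)]

w, a, b = 0.8, 0.05, 0.1     # hypothetical density parameters
s, c = 0.01, 0.08            # hypothetical volatility parameters
lam = 0.1                    # hypothetical market price of risk (one factor)
rho = [w * a * math.exp(-a * x) + (1 - w) * b * b * x * math.exp(-b * x) for x in xs]
drho = [-w * a * a * math.exp(-a * x) + (1 - w) * b * b * (1 - b * x) * math.exp(-b * x)
        for x in xs]
Sigma = [s * x * math.exp(-c * x) for x in xs]
dSigma = [s * (1 - c * x) * math.exp(-c * x) for x in xs]
r = rho[0]

x_bar = simpson([x * p for x, p in zip(xs, rho)], h)
x2_bar = simpson([x * x * p for x, p in zip(xs, rho)], h)
v = x2_bar - x_bar ** 2                                      # (6.5)
Sigma_bar = simpson(Sigma, h)
Sigma_bar1 = simpson([x * sg for x, sg in zip(xs, Sigma)], h)

# Coefficients of (6.11), and of d(x_bar^2) from (6.14).
drift_611 = simpson([x * x * (r * p + dp) for x, p, dp in zip(xs, rho, drho)], h)
vol_611 = -simpson([x * x * ds for x, ds in zip(xs, dSigma)], h)
drift_sq = 2 * r * x_bar ** 2 - 2 * x_bar + Sigma_bar ** 2
vol_sq = 2 * x_bar * Sigma_bar

# (6.9) as stated.
drift_69 = r * (v - x_bar ** 2) - Sigma_bar ** 2
vol_69 = 2 * (Sigma_bar1 - x_bar * Sigma_bar)

def drift_P_mean(xb):
    """Physical drift of x_bar_t: r x_bar - 1 + lambda Sigma_bar."""
    return r * xb - 1 + lam * Sigma_bar

def drift_P_var(gap):
    """Physical drift of v_t as a function of gap = v_t - x_bar_t^2."""
    return r * gap - Sigma_bar ** 2 + lam * vol_69

x_crit = (1 - lam * Sigma_bar) / r                  # (6.8)
gap_crit = (Sigma_bar ** 2 - lam * vol_69) / r      # (6.15)
print(drift_611 - drift_sq, drift_69, x_crit, gap_crit)
```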
7. Hilbert space dynamics for term structures

Now that we have examined some of the advantages of expressing the arbitrage-free interest rate term structure dynamics as a randomly evolving density function, let us consider how we transform to the Hilbert space representation for density functions considered in §2. Denote by $\xi_{tx}$ the process for the square-root likelihood function, defined by

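$$\rho_t(x) = \xi_{tx}^2. \tag{7.1}$$

It follows then, by Itô's lemma, that

$$d\rho_t(x) = 2\xi_{tx}\,d\xi_{tx} + (d\xi_{tx})^2, \tag{7.2}$$

and hence $(d\rho_t(x))^2 = 4\xi_{tx}^2(d\xi_{tx})^2$. By rearranging (7.2) we thus obtain

$$d\xi_{tx} = \frac{d\rho_t(x)}{2\xi_{tx}} - \frac{(d\rho_t(x))^2}{8\xi_{tx}^3} \tag{7.3}$$

for the dynamics of the process $\xi_{tx}$, and hence

$$d\xi_{tx} = \left(\partial_x\xi_{tx} + \tfrac{1}{2}r_t\xi_{tx} - \frac{\omega_{tx}^2}{8\xi_{tx}^3}\right)dt + \frac{1}{2\xi_{tx}}\,\omega_{tx}\cdot dW^*_t, \tag{7.4}$$

where $\omega_{tx}^2 = \omega_{tx}\cdot\omega_{tx}$. Now suppose we define $\sigma_{tx}$ by the quotient

$$\sigma_{tx} = \frac{\omega_{tx}}{\xi_{tx}^2} \tag{7.5}$$

as before, and set $\sigma_{tx}^2 = \sigma_{tx}\cdot\sigma_{tx}$. Then the process for the square-root density $\xi_{tx}$ can be written in the form

$$d\xi_{tx} = \left(\partial_x\xi_{tx} + \tfrac{1}{2}r_t\xi_{tx} - \tfrac{1}{8}\sigma_{tx}^2\xi_{tx}\right)dt + \tfrac{1}{2}\xi_{tx}\sigma_{tx}\cdot dW^*_t. \tag{7.6}$$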



We recall that the volatility process $\sigma_{tx}$ arising again in this connection, which is given more explicitly by the ratio

$$\sigma_{tx} = \frac{\partial_x\Sigma_{t,t+x}}{\partial_x P_{t,t+x}}, \tag{7.7}$$

can be specified exogenously, subject only to the condition that it has mean zero in the measure $\rho_t(x)$, which implies that $\sigma_{tx}$ can be written in the form (5.24).
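As a consistency check on (7.6), Itô's lemma applied to $\rho_t(x) = \xi_{tx}^2$ should return the density dynamics (5.22). Write $\mu_{tx}$ for the drift in (7.6). Then the drift $2\xi_{tx}\mu_{tx} + \tfrac{1}{4}\xi_{tx}^2\sigma_{tx}^2$ must equal $\partial_x\rho_t(x) + r_t\rho_t(x)$, and the volatility $\xi_{tx}^2\sigma_{tx}$ must equal $\rho_t(x)\sigma_{tx}$. The sketch evaluates both sides pointwise for a hypothetical mixture density and a hypothetical one-factor $\nu_{tx}$, with $\sigma_{tx}$ centred as in (5.24).

```python
import math

def simpson(f, lo, hi, n=30000):
    """Composite Simpson's rule for f on [lo, hi] with an even number n of intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

w, a, b = 0.8, 0.05, 0.1           # hypothetical density parameters

def rho(x):
    return w * a * math.exp(-a * x) + (1 - w) * b * b * x * math.exp(-b * x)

def drho(x):
    return -w * a * a * math.exp(-a * x) + (1 - w) * b * b * (1 - b * x) * math.exp(-b * x)

def nu(x):                          # hypothetical unconstrained nu_tx
    return 0.3 * math.exp(-x / 10)

r = rho(0.0)                        # short rate r_t = rho_t(0)
E_nu = simpson(lambda x: rho(x) * nu(x), 0.0, 600.0)

def sigma(x):                       # centred volatility, as in (5.24)
    return nu(x) - E_nu

max_drift_err = max_vol_err = 0.0
for x in [0.0, 0.5, 2.0, 10.0, 30.0, 100.0]:
    xi = math.sqrt(rho(x))
    dxi = drho(x) / (2 * xi)                          # d/dx of sqrt(rho)
    mu = dxi + 0.5 * r * xi - sigma(x) ** 2 * xi / 8  # drift in (7.6)
    vol = 0.5 * xi * sigma(x)                         # volatility in (7.6)
    # Ito: d(xi^2) = 2 xi d(xi) + (d xi)^2
    drift_from_xi = 2 * xi * mu + vol ** 2
    vol_from_xi = 2 * xi * vol
    max_drift_err = max(max_drift_err, abs(drift_from_xi - (drho(x) + r * rho(x))))
    max_vol_err = max(max_vol_err, abs(vol_from_xi - rho(x) * sigma(x)))

print(max_drift_err, max_vol_err)
```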

We would now like to interpret the Hilbert space dynamics in equation (7.6) more directly in a geometrical fashion. For this purpose we find it expedient to introduce an index notation, using Greek letters to signify Hilbert space operations (cf. Brody & Hughston 1998).

Thus if the function $\psi(x)$ is an element of $\mathcal{H} = L^2(\mathbb{R}_+)$, we denote it by $\psi^\alpha$, and if $\phi(x)$ belongs to the dual Hilbert space $\mathcal{H}^*$ we denote this by $\phi_\alpha$. Furthermore, their inner product is written

$$\psi^\alpha\phi_\alpha = \int_0^\infty \psi(x)\,\phi(x)\,dx. \tag{7.8}$$

There is a preferred symmetric quadratic form $g_{\alpha\beta}$ on $\mathcal{H}$, given by $g_{\alpha\beta}\psi^\alpha\psi^\beta = \int_0^\infty (\psi(x))^2\,dx$, which thus establishes an isomorphism between $\mathcal{H}$ and $\mathcal{H}^*$ given by $\psi^\alpha \mapsto \psi_\alpha = g_{\alpha\beta}\psi^\beta$. Intuitively, one can think of $g_{\alpha\beta}$ as corresponding to the delta function $\delta(x,y)$, and then we have

$$g_{\alpha\beta}\psi^\alpha\phi^\beta = \int_0^\infty\!\!\int_0^\infty \psi(x)\,\delta(x,y)\,\phi(y)\,dx\,dy. \tag{7.9}$$

Figure 3. Interest rate dynamics. At each instant of time the interest rate term structure can be represented as a point on the positive orthant of the unit sphere S in the Hilbert space $\mathcal{H} = L^2(\mathbb{R}_+)$. The associated arbitrage-free interest rate dynamics gives rise to a stochastic trajectory on this space, which is foliated by hypersurfaces corresponding to level values of the short-term interest rate.

There are a number of Hilbert space technicalities that have to be considered for 
a complete exposition of the matter, but that is not our immediate concern. 

If $\xi(x) > 0$ belongs to the positive orthant of $L^2(\mathbb{R}_+)$ then the corresponding indexed quantity $\xi^\alpha$ has the interpretation of a 'state vector'. In that case we can think of symmetric quadratic forms as representing certain classes of random variables. The expectation of the random variable $H_{\alpha\beta}$ in the state $\xi^\alpha$ is

$$E_\xi[H] = \frac{H_{\alpha\beta}\,\xi^\alpha\xi^\beta}{g_{\alpha\beta}\,\xi^\alpha\xi^\beta}. \tag{7.10}$$

Therefore, a state vector determines a mapping from random variables to real numbers, through (7.10). For a normalised state vector we have $\xi_\alpha\xi^\alpha = 1$, although for some purposes it is convenient to relax the normalisation condition. In particular, we notice that the expectation (7.10) only depends on the direction of $\xi^\alpha$.

Now suppose $\xi(x)$ is a positive function. In that case, the derivative $\partial_x$ can be thought of as a linear operator $D^{\alpha}{}_{\beta}$ on $\mathcal{H}$, and we have an endomorphism given by $\xi^\alpha \mapsto D^{\alpha}{}_{\beta}\xi^\beta$. By making use of this, we can now interpret, in the language of Hilbert space geometry, the first two terms appearing in the drift in the dynamical equation (7.6).

Let us begin by noting first that (5.16) can be rewritten in the form

$$\int_0^\infty \xi_{tx}\,\partial_x\xi_{tx}\,dx = -\tfrac{1}{2}r_t, \tag{7.11}$$

since the left-hand side equals $\tfrac{1}{2}\big[\xi_{tx}^2\big]_0^\infty = -\tfrac{1}{2}\rho_t(0)$. This allows us to interpret the short term interest rate process $r_t$ in terms of the mean of the symmetric part of the operator $D^{\alpha}{}_{\beta}$ in the state $\xi^\alpha$; i.e.,

$$r_t = -2D_{\alpha\beta}\,\xi^\alpha_t\xi^\beta_t, \tag{7.12}$$

where $D_{\alpha\beta} = g_{\alpha\gamma}D^{\gamma}{}_{\beta}$. Therefore, if we let $D_{(\alpha\beta)}$ denote the symmetric part of the operator $D_{\alpha\beta}$, then the abstract random variable in $\mathcal{H}$ corresponding to the short rate $r_t$ is given by $r_{\alpha\beta} = -2D_{(\alpha\beta)}$. Similarly we can represent the abstract random variable $x$ for the time left until maturity in $\mathcal{H}$ by a symmetric matrix $X_{\alpha\beta}$. It is interesting to note that the random variables $X_{\alpha\beta}$ for the maturity date and $r_{\alpha\beta}$ for the short term interest rate are not 'compatible'. Two random variables $A$ and $B$ are said to be compatible if the expression $\{\{A,C\},B\} - \{A,\{C,B\}\}$ vanishes for any random variable $C$, where $\{A,B\} = AB + BA$ denotes the anticommutator (Segal 1947). The lack of compatibility here indicates that the abstract probability system containing both $r_{\alpha\beta}$ and $X_{\alpha\beta}$ as random variables is not Kolmogorovian. However, the algebra of random variables generated by $X_{\alpha\beta}$ is Kolmogorovian.
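The compatibility condition attributed to Segal (1947) can be made concrete for finite matrices. Expanding the anticommutators gives the identity $\{\{A,C\},B\} - \{A,\{C,B\}\} = [C,[A,B]]$. For $n \times n$ matrices, $A$ and $B$ are therefore compatible exactly when they commute, since a matrix commuting with every $C$ is a multiple of the identity and a commutator has zero trace. The sketch below checks this identity numerically. The matrices are arbitrary random stand-ins chosen for illustration, not discretisations of $X_{\alpha\beta}$ or $r_{\alpha\beta}$.

```python
import random

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lincomb(A, B, s=1.0):
    """Entrywise A + s * B."""
    return [[p + s * q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]

def anticomm(A, B):
    """{A, B} = AB + BA."""
    return lincomb(matmul(A, B), matmul(B, A))

def comm(A, B):
    """[A, B] = AB - BA."""
    return lincomb(matmul(A, B), matmul(B, A), -1.0)

def compat_defect(A, B, C):
    """{{A, C}, B} - {A, {C, B}}; A and B are compatible if this vanishes for every C."""
    return lincomb(anticomm(anticomm(A, C), B), anticomm(A, anticomm(C, B)), -1.0)

def random_symmetric(n, rng):
    M = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    return [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]

def max_abs(A):
    return max(abs(v) for row in A for v in row)

rng = random.Random(0)
n = 4
A, B, C = (random_symmetric(n, rng) for _ in range(3))

identity_err = max_abs(lincomb(compat_defect(A, B, C), comm(C, comm(A, B)), -1.0))
defect_generic = max_abs(compat_defect(A, B, C))     # generic pair: incompatible

X = [[float(i) if i == j else 0.0 for j in range(n)] for i in range(n)]  # diagonal
X2 = matmul(X, X)                                    # a function of X, also diagonal
defect_diagonal = max_abs(compat_defect(X, X2, C))   # commuting pair: compatible

print(identity_err, defect_generic, defect_diagonal)
```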

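Now, let $\eta(x)$ be an arbitrary element of $L^2(\mathbb{R}_+)$, and let $\eta^\alpha$ be the corresponding Hilbert space vector. Then, on using (7.12) to write $\tfrac{1}{2}r_t = -D_{\gamma\delta}\xi^\gamma_t\xi^\delta_t$, we have

$$\int_0^\infty \eta(x)\left(\partial_x\xi_{tx} + \tfrac{1}{2}r_t\xi_{tx}\right)dx = \eta_\alpha\left(D^{\alpha}{}_{\beta} - (D_{\gamma\delta}\,\xi^\gamma_t\xi^\delta_t)\,\delta^{\alpha}{}_{\beta}\right)\xi^\beta_t. \tag{7.13}$$

In other words, the first two terms of the drift in (7.6) can be replaced by the expression $\tilde{D}^{\alpha}{}_{\beta}\xi^\beta_t$, where

$$\tilde{D}^{\alpha}{}_{\beta} = D^{\alpha}{}_{\beta} - (D_{\gamma\delta}\,\xi^\gamma_t\xi^\delta_t)\,\delta^{\alpha}{}_{\beta}, \tag{7.14}$$

and $\delta^{\alpha}{}_{\beta}$ is the Kronecker delta. Clearly, for the normalised state we have $\tilde{D}_{\alpha\beta}\,\xi^\alpha_t\xi^\beta_t = 0$, where $\tilde{D}_{\alpha\beta} = g_{\alpha\gamma}\tilde{D}^{\gamma}{}_{\beta}$.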
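Equations (7.12) to (7.14) can be illustrated with a finite-dimensional toy model. The discretisation is our own assumption, not the paper's. Maturities lie on a grid $x_i = ih$, the inner product $\langle f, g\rangle = h\sum_i f_i g_i$ stands in for $g_{\alpha\beta}$, and a forward difference stands in for $D^{\alpha}{}_{\beta}$. For the flat curve $\rho(x) = Re^{-Rx}$ we expect three things. The estimate of $r_t$ from (7.12) should approach $R$ as $h \to 0$. The vector $\tilde{D}^{\alpha}{}_{\beta}\xi^\beta$ should be orthogonal to $\xi^\alpha$. And (7.13) should hold up to discretisation error.

```python
import math

R = 0.5                      # flat continuously compounded rate; rho(x) = R e^{-Rx}
L, N = 60.0, 6000            # truncated grid x_i = i h, i = 0..N-1
h = L / N
xs = [i * h for i in range(N)]
xi = [math.sqrt(R * math.exp(-R * x)) for x in xs]   # square-root density xi(x)

def inner(f, g):
    """Rectangle-rule stand-in for g_ab f^a g^b."""
    return h * sum(p * q for p, q in zip(f, g))

def D(f):
    """Forward-difference stand-in for d/dx (backward difference at the last node)."""
    out = [(f[i + 1] - f[i]) / h for i in range(len(f) - 1)]
    out.append((f[-1] - f[-2]) / h)
    return out

D_xi = D(xi)
norm2 = inner(xi, xi)
mean_D = inner(xi, D_xi) / norm2            # D_ab xi^a xi^b in the ratio form (7.10)
r_est = -2 * mean_D                         # (7.12)
Dtilde_xi = [d - mean_D * v for d, v in zip(D_xi, xi)]    # (7.14) applied to xi
mean_Dtilde = inner(xi, Dtilde_xi) / norm2

# (7.13) with an arbitrary test function eta and the exact short rate r_t = R.
eta = [math.exp(-x) for x in xs]
lhs = inner(eta, [d + 0.5 * R * v for d, v in zip(D_xi, xi)])
rhs = inner(eta, Dtilde_xi)
print(r_est, mean_Dtilde, lhs - rhs)
```

The small gap between `r_est` and $R$, and between `lhs` and `rhs`, is first-order discretisation error of the forward difference.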

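With this in mind, let us now proceed to the interpretation of the volatility process $\sigma_{tx}$. Again, $\sigma_{tx}$ has the character of a linear operator acting on $\xi_{tx}$, subject to the constraint $E^\rho[\sigma_{tx}] = 0$. This can be consistently enforced if there exists a symmetric process $\nu_{t\alpha\beta}$ such that

$$\int_0^\infty \eta(x)\,\xi_{tx}\sigma_{tx}\,dx = \eta^\alpha\big(\nu_{t\alpha\beta}\,\xi^\beta_t - E_\xi[\nu_t]\,\xi_{t\alpha}\big). \tag{7.15}$$

The symmetric operator-valued random process $\nu_{t\alpha\beta}$, whose existence is thus implied, is 'primitive' in the sense that it is unconstrained and can be specified exogenously. If we write

$$\sigma_{t\alpha\beta} = \nu_{t\alpha\beta} - E_\xi[\nu_t]\,g_{\alpha\beta}, \tag{7.16}$$

we obtain

$$\int_0^\infty \eta(x)\,\xi_{tx}\sigma_{tx}\,dx = \eta^\alpha\sigma_{t\alpha\beta}\,\xi^\beta_t, \tag{7.17}$$

and also

$$\int_0^\infty \eta(x)\,\xi_{tx}\sigma_{tx}^2\,dx = \eta^\alpha\sigma_{t\alpha\gamma}\cdot\sigma_t{}^{\gamma}{}_{\beta}\,\xi^\beta_t. \tag{7.18}$$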
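In finite dimensions, with $g_{\alpha\beta}$ taken to be the identity matrix (an assumption made only for illustration), the construction (7.16) shifts a symmetric matrix by the multiple of the identity that makes its expectation in the state $\xi^\alpha$ vanish. The sketch below uses a random symmetric matrix as a hypothetical $\nu_{t\alpha\beta}$ and a random unit vector in the positive orthant as $\xi^\alpha$.

```python
import random

rng = random.Random(1)
n = 5

M = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
nu = [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]  # hypothetical nu_ab

xi = [abs(rng.gauss(0, 1)) for _ in range(n)]     # positive-orthant state vector
norm = sum(v * v for v in xi) ** 0.5
xi = [v / norm for v in xi]                       # normalised: xi_a xi^a = 1

def quad(A, u, v):
    """Bilinear form A_ab u^a v^b with g_ab = identity."""
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

E_nu = quad(nu, xi, xi)                           # E_xi[nu] for a normalised state
sigma = [[nu[i][j] - (E_nu if i == j else 0.0) for j in range(n)]
         for i in range(n)]                       # (7.16)
E_sigma = quad(sigma, xi, xi)
print(E_nu, E_sigma)
```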
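Therefore, putting the various ingredients together, we obtain:

Proposition 5. The dynamics of the Hilbert space vector that characterises the term structure in an admissible, arbitrage-free interest rate framework is governed by the stochastic differential equation

$$d\xi^\alpha_t = \left(\tilde{D}^{\alpha}{}_{\beta} - \tfrac{1}{8}\,\sigma_t{}^{\alpha}{}_{\gamma}\cdot\sigma_t{}^{\gamma}{}_{\beta}\right)\xi^\beta_t\,dt + \tfrac{1}{2}\,\sigma_t{}^{\alpha}{}_{\beta}\,\xi^\beta_t\cdot\big(dW_t + \lambda_t\,dt\big), \tag{7.19}$$

where $\tilde{D}^{\alpha}{}_{\beta}$ is given as in (7.14), and the adapted operator-valued process $\sigma_{t\alpha\beta}$ is expressible in the form

$$\sigma_{t\alpha\beta} = \nu_{t\alpha\beta} - (\nu_{t\gamma\delta}\,\xi^\gamma_t\xi^\delta_t)\,g_{\alpha\beta}, \tag{7.20}$$

where $\nu_{t\alpha\beta}$ is an arbitrary adapted operator-valued process.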

This result shows that the evolution of the yield curve can be viewed consistently as a process on the positive orthant of the unit sphere in Hilbert space, and thus gives rise to an entirely new way of understanding the dynamics of the term structure. The purpose of the quadratic term in the drift of (7.19) is to keep the process on the sphere, and in the absence of the term involving the operator $\tilde{D}^{\alpha}{}_{\beta}$ we would have a general local martingale on the sphere S with respect to the risk-neutral measure, where the martingale property on S is characterised in a standard way by use of the techniques of stochastic differential geometry (see, e.g., Emery 1989, Ikeda & Watanabe 1989, Hughston 1996). The term involving the operator $D^{\alpha}{}_{\beta}$ splits into a symmetric and an antisymmetric part. The drift generated by the antisymmetric part of $D^{\alpha}{}_{\beta}$ corresponds to a symmetry (a rotation) of the sphere S. The drift generated by the symmetric part of $D^{\alpha}{}_{\beta}$, on the other hand, is a negative gradient vector field orthogonal to surfaces in S generated by level values of the short rate $r_t$. This term therefore creates a tendency for the vector to drift towards a lower interest rate, a property of the negative gradient field which is then counterbalanced by the effects of the diffusive term.
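The role of the quadratic drift term in (7.19) can be seen in a finite-dimensional Euler-Maruyama sketch. Everything here is a hypothetical stand-in:

- a 4-dimensional space with $g_{\alpha\beta}$ equal to the identity;
- a random non-symmetric matrix in place of $D^{\alpha}{}_{\beta}$;
- a constant random symmetric $\nu_{\alpha\beta}$;
- a single Brownian factor and a constant $\lambda$.

Expectations are taken in the ratio form (7.10), so $\xi_\alpha\tilde{D}^{\alpha}{}_{\beta}\xi^\beta = 0$ and $\xi^\alpha\sigma_{\alpha\beta}\xi^\beta = 0$ hold exactly at every step. The norm of $\xi^\alpha_t$ should then stay close to 1 up to discretisation noise. The toy model makes no attempt to keep $\xi^\alpha_t$ in the positive orthant.

```python
import math
import random

rng = random.Random(42)
n = 4
D = [[rng.gauss(0, 0.5) for _ in range(n)] for _ in range(n)]         # stand-in for D^a_b
M = [[rng.gauss(0, 0.3) for _ in range(n)] for _ in range(n)]
nu = [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]  # stand-in for nu_ab
lam = 0.2                                                            # market price of risk

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def apply_sigma(u, e_nu):
    """sigma_ab u^b with sigma = nu - E_xi[nu] g, as in (7.20)."""
    return [s - e_nu * q for s, q in zip(matvec(nu, u), u)]

xi = [0.5] * n               # unit vector in the positive orthant
dt, steps = 1e-3, 1000
max_dev = 0.0
for _ in range(steps):
    nn = dot(xi, xi)
    D_xi = matvec(D, xi)
    Dtilde_xi = [d - dot(xi, D_xi) / nn * v for d, v in zip(D_xi, xi)]   # (7.14)
    e_nu = dot(xi, matvec(nu, xi)) / nn                                   # (7.10)
    sig_xi = apply_sigma(xi, e_nu)
    sig2_xi = apply_sigma(sig_xi, e_nu)
    dW_star = rng.gauss(0.0, math.sqrt(dt)) + lam * dt                    # dW + lambda dt
    xi = [v + (dd - s2 / 8) * dt + 0.5 * s1 * dW_star                     # (7.19)
          for v, dd, s2, s1 in zip(xi, Dtilde_xi, sig2_xi, sig_xi)]
    max_dev = max(max_dev, abs(math.sqrt(dot(xi, xi)) - 1.0))

print(f"max |norm - 1| over {steps} steps: {max_dev:.2e}")
```

Dropping the $-\tfrac{1}{8}\sigma^2$ term from the update lets $|\xi|^2$ drift upward by roughly $\tfrac{1}{4}\xi^\alpha\sigma_{\alpha\gamma}\sigma^{\gamma}{}_{\beta}\xi^\beta\,dt$ per step. The quadratic term exactly offsets this Itô correction.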